Abstract

Capacity analysis for channels with side information at the receiver has been an
active area of interest. This problem is well investigated for the case of finite alphabet
channels. However, the results are not easily generalizable to the case of continuous
alphabet channels due to analytic difficulties inherent with continuous alphabets.
In the first part of this two-part paper, we address an analytical framework for capacity
analysis of continuous alphabet channels with side information at the receiver.
For this purpose, we establish novel necessary and sufficient conditions for weak* continuity
and strict concavity of the mutual information. These conditions are used in
investigating the existence and uniqueness of the capacity-achieving measures. Fur- 
thermore, we derive necessary and sufficient conditions that characterize the capacity 
value and the capacity-achieving measure for continuous alphabet channels with side 
information at the receiver. 

Index Terms 

Capacity, capacity-achieving measure, concavity, continuous alphabets, mutual in- 
formation, and optimization. 

1 Introduction



We consider the capacity analysis for continuous alphabet channels with side information at 
the receiver, i.e., channels where the input, output, state, and side information alphabets 
are abstract continuous spaces. For finite alphabet channels, this problem is well explored 
in the literature, e.g., [1], [2], [3], [4], [5], and [6]. However, the results for finite alphabet 
channels are not necessarily generalizable to continuous alphabet channels. 

In fact, as shown by Csiszár [7], there are some technical difficulties that must be
considered when working with continuous alphabet channels. Recall that in finite alphabet
channels, the capacity analysis is performed over a finite dimensional space of input prob- 
ability distributions, e.g., the simplex of input probability distributions. In this case, the 
mutual information is a real-valued function over the space of input distributions. As a 
result, the capacity analysis can be conducted over the Euclidean topology. Hence, one 
can simply verify the required global and local analytical properties of the set of input dis- 
tributions and the mutual information. In contrast, for continuous alphabet channels, the 
capacity analysis needs to be conducted over the weak* topology. This requires completely
different analytical tools and arguments that are based on machinery from measure theory
and functional analysis.

In the first part of this two-part study, we introduce an analytical framework for capacity 
analysis of continuous alphabet channels with side information at the receiver. From the 
practical point of view, the results of this part are useful in capacity analysis for a large 
class of channels including fading channels with side information at the receiver. In these 
channels, since the channel state (realization) changes from time to time, new challenges are
imposed in capacity analysis of the channel. Moreover, according to how much knowledge
we have about the channel state ahead of time, one might have a range of scenarios from
no channel state information (CSI) to full CSI, see e.g., [8], [9], [10], [11], and [12]. Hence, 
a unified analytical framework is required that enables us to tackle the capacity analysis 
for different scenarios. In the first part of this paper, we address a general framework for 
capacity analysis of continuous alphabet channels followed by applications to the multiple 
antenna channels in the second part. Specifically, in this part, we address certain analytical 
properties of the space of input measures and the mutual information function based on
notions from measure theory, functional analysis, and convex optimization. 

The organization of this part is as follows. A brief introduction to the problem setup 
is given in Section 2. In Section 3, we introduce an analytical treatment of the space of 
input measures and the mutual information function and address issues such as the weak* 
compactness of the space of input measures along with strict concavity and weak* continuity 
of the mutual information. In Section 4, we raise the issue of capacity analysis and address 
necessary and sufficient conditions regarding the existence, uniqueness, and the expression of 
the capacity-achieving measure. Finally, Section 5 states some concluding remarks along with 
some guidelines for future work. A brief introduction to the required analytical preliminaries 
for this paper is given in Appendix A. A detailed investigation of applications of the results 
of this part to multiple antenna channels will be provided in the second part of this two-part 
paper. 

2 Setup 

In this section, we introduce the setup for continuous alphabet channels with side
information. We assume a discrete-time memoryless channel (DTMC) where $X$, $Y$, $S$, and $V$
denote the input, output, state, and side information alphabets of a point-to-point
communication channel. We assume that $X$, $Y$, $S$, and $V$ are locally compact Hausdorff (LCH)
spaces [13], e.g., alphabets like $\mathbb{R}^n$ (or $\mathbb{C}^n$), which are separable [14].
Moreover, each alphabet is equipped with its Borel $\sigma$-algebra; e.g., $(X, \mathcal{B}_X)$,
$(Y, \mathcal{B}_Y)$, and $(S, \mathcal{B}_S)$ are the Borel-measurable spaces denoting the input,
output, and state alphabets of the DTMC, respectively, where $\mathcal{B}_X$, $\mathcal{B}_Y$, and
$\mathcal{B}_S$ denote the Borel $\sigma$-algebras of $X$, $Y$, and $S$, respectively. The DTMC is
represented by a collection of Radon probability measures [13] over $(Y, \mathcal{B}_Y)$ as follows,

$$W_{X,S}(Y) = \{ W(\cdot|x,s) \in \mathcal{P}(Y) \mid x \in X,\ s \in S \}, \qquad (1)$$

where $\mathcal{P}(Y)$ is the collection of all Radon probability measures over $(Y, \mathcal{B}_Y)$.
Note that the elements of the set $W_{X,S}(Y)$ are probability measures over $(Y, \mathcal{B}_Y)$,
that is, for each $x$ and $s$, $W(\cdot|x,s)$ is a probability measure on $(Y, \mathcal{B}_Y)$.

We assume that there exists some side information available at the receiver that is denoted
by a measurable space $(V, \mathcal{B}_V)$ and characterized by a joint probability measure
$Q \circ R$ over $S \times V$. As a result, the side information is modelled by a conditional
probability measure $Q_v$ over $(S, \mathcal{B}_S)$ for every $v \in V$. This is an appropriate
model for side information, since it can model different scenarios. For example, in the case of
full channel state information (CSI), knowing $v$ leaves no uncertainty about the state, hence
$Q_v$ is just the Dirac measure [13]. On the other hand, when there exists no side information
available at the receiver, the probability measure $Q_v$ is some measure $Q$ independent of $v$.
As a result of the existence of side information, the channel can be modelled by conditional
probability measures on $(Y, \mathcal{B}_Y)$ as follows

$$W_{Q_v}(\cdot|x) = \int_S W(\cdot|x,s)\, dQ_v(s). \qquad (2)$$
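
To make the averaging over the state concrete, the following is a minimal sketch for finite alphabets, where the integral over the state reduces to a weighted sum. The function name `average_channel` and the two-state binary channel are hypothetical and chosen only for illustration; they are not part of the framework of this paper.

```python
def average_channel(W, Q_v):
    """Finite-alphabet analogue of averaging the channel over the state:
    returns the matrix with entries sum_s Q_v[s] * W[s][x][y]."""
    n_states = len(W)
    n_x = len(W[0])
    n_y = len(W[0][0])
    return [
        [sum(Q_v[s] * W[s][x][y] for s in range(n_states)) for y in range(n_y)]
        for x in range(n_x)
    ]

# Hypothetical two-state binary channel (an assumption for illustration):
# state 0 is noiseless, state 1 makes the output independent of the input.
W = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]

# Full CSI: given v, the state is known, so Q_v is a Dirac measure on that state.
full_csi = average_channel(W, [1.0, 0.0])

# No CSI: Q_v is the same measure Q for every v (here, uniform over the states).
no_csi = average_channel(W, [0.5, 0.5])
```

With a Dirac measure the averaged channel coincides with the channel of the known state, whereas without side information the receiver only sees the state-averaged channel.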



Having the above channel model, an $n$-length block code for the channel is a pair of mappings
$(f, \phi)$, where $f$ maps some finite message set $M$ into $X^n$ and $\phi$ maps $Y^n$ to $M$.
The mapping $f$ is called the encoder and the image of $M$ under $f$ is called the codebook.
Correspondingly, the mapping $\phi$ is called the decoder [1]. Assuming that the channel is
memoryless, the channel from $X^n$ to $Y^n$ is governed by probability measures

$$W^n_{Q_v}(E_1 \times \cdots \times E_n \,|\, x) = \prod_{i=1}^{n} W_{Q_{v_i}}(E_i \,|\, x_i),$$

which are conditional measures on the side information vector
$v = (v_1, v_2, \cdots, v_n) \in V^n$. Since the probability measure on $(V, \mathcal{B}_V)$ is $R$,
the average probability of error for transmission of message $m$, denoted by
$e(m, W^n, f, \phi)$, is obtained by averaging the conditional probability of a decoding error
given $v$ with respect to $R$, and the maximum probability of error is defined by
$e(W^n, f, \phi) = \max_m e(m, W^n, f, \phi)$. The channel coding problem is to make the message
set $M$ (and hence the rate) as large as possible while keeping the maximum probability of error
arbitrarily low, subject to some constraints applied to the choice of codebook.

A non-negative rate $R$ for the channel is an $\epsilon$-achievable rate, if for every $\delta > 0$
and every sufficiently large $n$ there exist $n$-length codes of rate exceeding $R - \delta$ and
probability of error less than $\epsilon$. Correspondingly, the rate $R$ is an achievable rate if
it is $\epsilon$-achievable for all $0 < \epsilon < 1$. The supremum of achievable rates is called
the channel capacity.

There are a number of problems that need to be addressed in capacity assessment of 
a channel: These include the capacity value and the existence, the uniqueness, and the 
characterization of the capacity-achieving input measures. In this part of this two-part 
study, we introduce a framework to address the above problems in a unified manner for 
different classes of channels. 

3 An analytical treatment 

In capacity analysis of communication channels, there are often some constraints applied 
to the transmitted signals. Commonly, this is in the form of a maximum or an average 
energy constraint [15]. A maximum energy constraint is translated into a restriction of the 
input alphabet to a bounded subset of $X$.$^1$ On the other hand, an average energy constraint
is translated to input measures with a second moment constraint. Restriction of input 
probability measures by higher moment constraints or a combination of moment constraints 
and a bounded alphabet are also considered in practice, see e.g., [15], [16]. Since the capacity 
analysis problem is a convex optimization problem, it is of interest to know whether such 
a restricted collection of input probability measures is convex and compact (in a certain 
sense). Moreover, since we try to optimize the mutual information over such a collection, we 
need to investigate the global and local analytical behavior of the mutual information over 
the space of input measures. 

In this section, we address some analytical notions and properties of the space of input 
measures and the mutual information that are essential to the capacity analysis of continuous
alphabet channels. We assume that the reader has an elementary background in functional
analysis. However, the reader can refer to Appendix A for an overview of the analytical
preliminaries that are used throughout the paper. For the sake of conciseness, we only
state the main results in this section and defer the details to Appendix A.

1 For example, applications that use a hard-limiter power amplifier.

3.1 Weak* compactness of the space of input probability measures

Let $(X, \mathcal{B}_X)$ be an LCH Borel-measurable space. Let $\mathcal{M}(X)$ denote the space of
Radon measures over $(X, \mathcal{B}_X)$. In probability theory, where the objects of interest are
the set of probability measures $\mathcal{P}(X) \subset \mathcal{M}(X)$, the weak* topology, the
weakest topology over $\mathcal{M}(X)$, is used to investigate the analytical properties of the
functionals that are defined over $\mathcal{P}(X)$.

In the weak* topology, the convergence phenomenon is called weak* convergence$^2$ and is defined
as follows. A sequence of probability measures $\{P_n\}$ converges weakly* to $P$, denoted by
$P_n \xrightarrow{w^*} P$, if and only if $\int f\,dP_n \to \int f\,dP$ for all $f \in C_b(X)$, where

$$C_b(X) = \{ f : X \to \mathbb{R} \mid f \text{ is continuous and bounded} \}$$

denotes the set of all bounded continuous functions.
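
As a small illustration of this definition, the sketch below uses Dirac measures at the points $1/n$, which converge weakly* to the Dirac measure at $0$. The helper names are hypothetical. The example also shows why the test functions must be continuous: for the indicator of $\{0\}$, which is bounded but discontinuous, the integrals do not converge to the integral against the limit.

```python
import math

def integrate_point_mass(f, location):
    """Integral of f against the Dirac measure concentrated at `location`."""
    return f(location)

def bounded_continuous(x):
    # arctan is continuous and bounded by pi/2, so it belongs to C_b(R).
    return math.atan(x)

def indicator_of_zero(x):
    # Bounded but discontinuous at 0, so it is not a valid test function.
    return 1.0 if x == 0 else 0.0

# Integrals of the test function against the Dirac measures at 1/n.
integrals = [integrate_point_mass(bounded_continuous, 1.0 / n) for n in (1, 10, 100, 1000)]

# Distance to the integral against the limit measure (Dirac measure at 0).
gap_continuous = abs(integrals[-1] - integrate_point_mass(bounded_continuous, 0.0))
gap_indicator = abs(
    integrate_point_mass(indicator_of_zero, 1.0 / 1000)
    - integrate_point_mass(indicator_of_zero, 0.0)
)
```

Here `gap_continuous` shrinks as $n$ grows, while `gap_indicator` stays equal to one for every $n$.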

Corresponding to the definition of weak* convergence, we have a notion of compactness
which is called weak* compactness. That is, a family of probability measures
$\mathcal{P}_A(X) \subset \mathcal{P}(X)$ is relatively weak* compact if every sequence of measures
in $\mathcal{P}_A(X)$ contains a subsequence which converges weakly* (see Appendix A) to a
probability measure in the closure of $\mathcal{P}_A(X)$.$^3$ In general, verification of relative
compactness of probability measures over an abstract space is not an easy task. However, for
complete, separable spaces [13], there is a simple way to verify this property, as follows.

A family of probability measures $\mathcal{P}_A(X) \subset \mathcal{P}(X)$ is tight if for every
$\epsilon > 0$, there is a compact set $K \subset X$ such that
$\sup_{P \in \mathcal{P}_A(X)} P(K^c) < \epsilon$. Based on this definition, we restate
Prokhorov's Theorem from [17].

Theorem 3.1 (Prokhorov's Theorem). Let $\mathcal{P}_A(X)$ be a family of probability measures
defined over the complete separable measurable space $(X, \mathcal{B}_X)$. Then $\mathcal{P}_A(X)$
is relatively weak* compact if and only if it is tight.

Proof. See [17, p. 318] □ 
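
To illustrate how tightness can be checked in practice, the following sketch considers measures on the real line whose second moment is at most `gamma`. By Markov's inequality, $P(|x| > r) \le \gamma / r^2$, so the compact interval $[-r, r]$ with $r = \sqrt{\gamma/\epsilon}$ carries all but $\epsilon$ of the mass of every such measure. The helper names and the three discrete measures are hypothetical and serve only as an example.

```python
import math

def tightness_radius(gamma, eps):
    """Radius r with gamma / r**2 <= eps; Markov's inequality then gives
    P(|x| > r) <= eps for every P whose second moment is at most gamma."""
    return math.sqrt(gamma / eps)

def second_moment(points, probs):
    return sum(p * x * x for x, p in zip(points, probs))

def mass_outside(points, probs, r):
    """Mass that a discrete measure places outside the compact set [-r, r]."""
    return sum(p for x, p in zip(points, probs) if abs(x) > r)

gamma, eps = 1.0, 0.01
r = tightness_radius(gamma, eps)

# Hypothetical discrete measures, each with second moment (approximately) one.
family = [
    ([-1.0, 1.0], [0.5, 0.5]),
    ([0.0, 20.0], [1 - 1 / 400, 1 / 400]),
    ([0.0, 100.0], [1 - 1e-4, 1e-4]),
]

largest_second_moment = max(second_moment(pts, pr) for pts, pr in family)
worst_tail = max(mass_outside(pts, pr, r) for pts, pr in family)
```

The same radius works for every member of the family, which is exactly the uniformity that the definition of tightness requires.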

As a result, for $X = \mathbb{R}^n$ (or $\mathbb{C}^n$) together with the Borel $\sigma$-algebra
$\mathcal{B}_X$, it suffices only to check the hypothesis of Prokhorov's Theorem. Using
Prokhorov's Theorem, [7] derived the following sufficient condition for compactness of a
restricted collection of probability measures.

2 In textbooks on probability theory, the term vague is used instead of weak*.
3 Note that the term "relative" refers to the compactness of closure.

Lemma 3.1. Let $g : X \to \mathbb{R}^k$ be a nonnegative Borel-measurable function such that the
set $K_L = \{x \in X \mid g_i(x) \le L_i,\ i = 1, \cdots, k\}$ is compact for every
$L \in \mathbb{R}^{+k}$. Then, the collection

$$\mathcal{P}_{g,\Gamma}(X) = \Big\{ P \in \mathcal{P}(X) \;\Big|\; \int g_i(x)\, dP \le \Gamma_i,\ i = 1, \cdots, k \Big\}$$

is tight and closed, and hence weak* compact for every $\Gamma \in \mathbb{R}^{+k}$.

Proof. See [7, Lem. 1]. □

Note that Lemma 3.1 holds in general for a collection of constraints defined by positive
functions $\{g_i\}$ and positive values $\{\Gamma_i\}$ such that each $g_i$ satisfies the hypothesis
of Lemma 3.1. As an example of the usage of Lemma 3.1, one can consider $X = \mathbb{R}^n$ along
with a restricting function $g(x) = \|x\|_2^2$ and a fixed positive value $\Gamma > 0$ to easily
verify that the set of probability measures with a second moment constraint,
$\mathcal{P}_{g,\Gamma}(X)$, is compact. Likewise, if $A$ is a compact subset of $X$, one can
consider

$$g(x) = \begin{cases} 0, & x \in A, \\ \infty, & \text{otherwise,} \end{cases}$$

and a fixed positive value $\Gamma > 0$ to easily verify that $\mathcal{P}_{g,\Gamma}(X)$ is compact.

3.2 Mutual information

In this subsection, we provide conditions for weak* continuity (see Appendix A) of the mutual
information over a set of probability measures. We also state and prove some novel conditions
for strict concavity of the mutual information. Applications of these properties will be
explored in the next section, where they will be used to address the existence, the uniqueness,
and the characterization of the capacity-achieving measure for continuous alphabet channels
with side information.

3.2.1 Definition

To present the precise expression of the mutual information, following [7] and [18], we first
express the definition of informational divergence or relative entropy as follows.

For a given measurable space $(X, \mathcal{B}_X)$, consider two probability measures $P$ and $Q$.
The informational divergence between these two measures is defined by [18]

$$D(P\|Q) \triangleq \sup \Big\{ \sum_{i=1}^{N} P(E_i) \log_2 \frac{P(E_i)}{Q(E_i)} \;:\; N \in \mathbb{N},\ E_i \in \mathcal{B}_X \text{ disjoint, and } X = \bigcup_{i=1}^{N} E_i \Big\}. \qquad (3)$$

This can be viewed as the generalization of relative entropy from probability measures on finite
sets to probability measures on infinite sets. By (3), if there exists an
$E_i \in \mathcal{B}_X$ such that $Q(E_i) = 0$ but $P(E_i) \neq 0$, then $D(P\|Q) = \infty$. Thus,
a necessary condition to have a finite relative entropy between $P$ and $Q$ is that for every
$E \in \mathcal{B}_X$ with $Q(E) = 0$, $P(E) = 0$. But this means that $P$ is absolutely continuous
with respect to $Q$, denoted by $P \ll Q$ (see Appendix A).

By the log-sum inequality [19], it can be verified that for each partition in the right-hand
side (RHS) of (3), any refinement of the partition does not decrease the value of the summation.
In fact, as the partitions get finer, the finite sum in the RHS of (3) gets closer to
$D(P\|Q)$. This observation provides intuition into an important result of [18] which expresses
$D(P\|Q)$ as

$$D(P\|Q) = \int \log_2 \frac{dP}{dQ}\, dP, \qquad (4)$$

where $\frac{dP}{dQ}$ is the density of $P$ with respect to $Q$ [13, p. 91]. In fact, the condition
$P \ll Q$ is a necessary and sufficient condition for the finiteness of the informational
divergence as we show below.
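
The monotonicity under refinement can be checked numerically on a finite set. The following is a minimal sketch in which the function `partition_divergence` (a hypothetical name) evaluates the finite sum inside the supremum for a given partition; the four-point measures are arbitrary illustrative choices.

```python
import math

def partition_divergence(P, Q, partition):
    """Sum over the cells E of P(E) * log2(P(E) / Q(E)) for a finite partition,
    i.e. one candidate value inside the supremum that defines D(P||Q)."""
    total = 0.0
    for cell in partition:
        p = sum(P[i] for i in cell)
        q = sum(Q[i] for i in cell)
        if p > 0:
            if q == 0:
                return math.inf  # P is not absolutely continuous w.r.t. Q
            total += p * math.log2(p / q)
    return total

# Hypothetical measures on a four-point set.
P = [0.5, 0.25, 0.125, 0.125]
Q = [0.25, 0.25, 0.25, 0.25]

coarse = partition_divergence(P, Q, [[0, 1, 2, 3]])
medium = partition_divergence(P, Q, [[0, 1], [2, 3]])
fine = partition_divergence(P, Q, [[0], [1], [2], [3]])

# A cell with Q(E) = 0 but P(E) > 0 forces an infinite divergence.
singular = partition_divergence([1.0, 0.0], [0.0, 1.0], [[0], [1]])
```

On a finite set the finest partition attains the supremum, so `fine` equals the usual relative entropy, and the values are ordered as `coarse <= medium <= fine`.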

Proposition 3.1. For a pair of probability measures $P$ and $Q$, $D(P\|Q) < \infty$ if and only if
$P \ll Q$. Furthermore, $\int \big|\log_2 \frac{dP}{dQ}\big|\, dP < \infty$ if and only if $P \ll Q$.

Proof. The direct part of this statement is proved in [18, p. 20], as can be observed from (4).
Suppose the inverse part is not true. That is, $P \ll Q$ but
$\int \log_2 \frac{dP}{dQ}\, dP = \infty$. Because $P$ is a finite measure, for the set
$E = \{x \in X : \frac{dP}{dQ} = \infty\}$ we must have $P(E) > 0$. On the other hand, since
$P(E) = \int_E \frac{dP}{dQ}\, dQ$, this requires that $Q(E) = 0$. This is a contradiction to the
hypothesis that $P \ll Q$. Using the inequality
$\int \big|\log_2 \frac{dP}{dQ}\big|\, dP \le D(P\|Q) + \frac{2}{e \ln 2}$ from [18, p. 20], we
conclude the rest of the proof. □

By the Lebesgue-Radon-Nikodym Theorem [13, p. 90], there exists a positive real-valued
function $f = \frac{dP}{dQ}$ such that $P(E) = \int_E f\, dQ$. Thus, for $P \ll Q$,

$$D(P\|Q) = \int \frac{dP}{dQ} \log_2 \frac{dP}{dQ}\, dQ = \int f \log_2 f\, dQ. \qquad (5)$$

Using the expression of relative entropy as (4) and (5), we now introduce a precise expression 
of mutual information function. 

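Let $(X, \mathcal{B}_X)$ be the input and $(Y, \mathcal{B}_Y)$ be the output measurable spaces of a
channel. The product space of $X$ and $Y$ is denoted by
$(X \times Y, \mathcal{B}_X \otimes \mathcal{B}_Y)$, where $\mathcal{B}_X \otimes \mathcal{B}_Y$ is the
Borel $\sigma$-algebra induced on $X \times Y$. Let $P$ and $T$ be two probability measures over
$(X, \mathcal{B}_X)$ and $(Y, \mathcal{B}_Y)$, respectively. The probability measure that is induced
on $(X \times Y, \mathcal{B}_X \otimes \mathcal{B}_Y)$ is denoted by $P \times T$, which is defined
as follows,

$$\forall E \in \mathcal{B}_X \otimes \mathcal{B}_Y, \quad (P \times T)(E) = \iint_E d(P \times T) = \int_Y \int_{E_y} dP\, dT,$$

where for every $y \in Y$, $E_y = \{x \in X \mid (x, y) \in E\}$.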

As mentioned before, since side information is available at the receiver, the channel is
described by probability measures $W_{Q_v}(\cdot|x)$ defined as in (2). For an input probability
measure $P$, let the joint conditional measure of the input and output be denoted by
$P \circ W_{Q_v}$ and let the marginal output measure be denoted by $PW_{Q_v}$, defined as follows.
For every $A \times B \in \mathcal{B}_X \times \mathcal{B}_Y$, we have

$$P \circ W_{Q_v}(A \times B) = \int_A W_{Q_v}(B|x)\, dP,$$

which results in a marginal probability measure on $(Y, \mathcal{B}_Y)$ such that

$$PW_{Q_v}(B) = \int W_{Q_v}(B|x)\, dP.$$

It can be verified that $P \circ W_{Q_v} \ll P \times PW_{Q_v}$. On the other hand,
$P \circ W_{Q_v} \ll P \times PW_{Q_v}$ if and only if $W_{Q_v}(\cdot|x) \ll PW_{Q_v}$ $P$-a.e.
As a result, following [7], we can express the mutual information of the channel as

$$I(P, W_{Q_v}|R) = \iint D\big(W_{Q_v}(\cdot|x) \,\|\, PW_{Q_v}\big)\, dP\, dR = \iiint \log_2 \frac{dW_{Q_v}(\cdot|x)}{d(PW_{Q_v})}\, dW_{Q_v}(\cdot|x)\, dP\, dR, \qquad (6)$$

where $R$ denotes the probability measure on the space of channel state information
$(V, \mathcal{B}_V)$. To emphasize that the mutual information is a function over
$\mathcal{P}_{g,\Gamma}(X)$, we deliberately use a different notation for it (as in [1]) rather than
the more common notation expressed in terms of random variables [19]. In the following
subsections, we investigate some global and local analytical properties of the mutual
information (6) such as concavity and continuity.



3.2.2 Convexity and concavity 

In this part, we address some global analytical properties of the mutual information. For this 
purpose, we first study these global properties of the relative entropy and then we generalize 
them for mutual information. 

The convexity of relative entropy with respect to the convex combination of a pair of 
measures is well known [18]. However, to the best of our knowledge, necessary and sufficient 
conditions for its strict convexity were not known before. This is of particular interest to 
show the uniqueness of the capacity-achieving measure, as it will be shown later. Hence, in 
the following theorem, we state necessary and sufficient conditions for the strict convexity 
of relative entropy. 

Theorem 3.2. $D(P\|Q)$ is convex with respect to the pair $(P, Q)$. That is, for given pairs
$(P_1, Q_1)$ and $(P_2, Q_2)$ and given scalar $0 < \alpha < 1$,

$$D(\alpha P_1 + (1-\alpha) P_2 \,\|\, \alpha Q_1 + (1-\alpha) Q_2) \le \alpha D(P_1\|Q_1) + (1-\alpha) D(P_2\|Q_2).$$

Moreover, the inequality is strict if and only if there exists a set $E \in \mathcal{B}_X$ such that
$\frac{dP_1}{dQ_1} \neq \frac{dP_2}{dQ_2} \neq 0$ on $E$ and for all nonempty Borel-measurable
$F \subset E$, $F \in \mathcal{B}_X$, $P_1(F) \neq 0$ and $P_2(F) \neq 0$.

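Proof. For convenience in derivations, let $\beta = 1 - \alpha$. Then, it can be verified that
$Q_1 \ll \alpha Q_1 + \beta Q_2$ and $Q_2 \ll \alpha Q_1 + \beta Q_2$. Let $g_1$ and $g_2$ denote the
density functions of $Q_1$ and $Q_2$ with respect to $\alpha Q_1 + \beta Q_2$, respectively. That is,
$dQ_1 = g_1\, d(\alpha Q_1 + \beta Q_2)$ and $dQ_2 = g_2\, d(\alpha Q_1 + \beta Q_2)$. Note that
$\alpha g_1 + \beta g_2 = 1$. Since $P_1 \ll Q_1$ and $P_2 \ll Q_2$ are associated with density
functions $f_1 = \frac{dP_1}{dQ_1}$ and $f_2 = \frac{dP_2}{dQ_2}$, then
$\alpha P_1 + \beta P_2 \ll \alpha Q_1 + \beta Q_2$ and
$d(\alpha P_1 + \beta P_2) = (\alpha f_1 g_1 + \beta f_2 g_2)\, d(\alpha Q_1 + \beta Q_2)$. Thus,

$$\begin{aligned}
D(\alpha P_1 + \beta P_2 \,\|\, \alpha Q_1 + \beta Q_2) &= \int (\alpha f_1 g_1 + \beta f_2 g_2) \log_2 (\alpha f_1 g_1 + \beta f_2 g_2)\, d(\alpha Q_1 + \beta Q_2) \\
&\le \alpha \int f_1 \log_2 f_1\, dQ_1 + \beta \int f_2 \log_2 f_2\, dQ_2 \quad \text{(log-sum inequality)} \\
&= \alpha D(P_1\|Q_1) + \beta D(P_2\|Q_2).
\end{aligned}$$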
For strictness of the inequality, note that in the log-sum inequality, for $x \in X$, strict
inequality occurs if $f_1(x) \neq f_2(x) \neq 0$ and $g_1(x) \neq 0$, $g_2(x) \neq 0$. Let $N$ denote
the maximal null set of $\alpha Q_1 + \beta Q_2$, then define

$$E = \{x \in X \setminus N : f_1(x) \neq f_2(x) \neq 0,\ g_1(x) \neq 0,\ g_2(x) \neq 0\}.$$

This set is Borel measurable since $f_1$, $f_2$, $g_1$, $g_2$ are Borel measurable. To have strict
inequality, we need $E$ such that $(\alpha Q_1 + \beta Q_2)(E) \neq 0$. Because $g_1$ and $g_2$ are
nonzero over $E$, $Q_1(E) \neq 0$ and $Q_2(E) \neq 0$. Since $f_1$ and $f_2$ are also nonzero,
$P_1(E) \neq 0$ and $P_2(E) \neq 0$. For every nonempty Borel-measurable subset $F \subset E$, the
above argument holds. This proves the direct part of the assertion.

On the other hand, suppose there exists $E \in \mathcal{B}_X$ with the above definitions such that
for every nonempty Borel-measurable $F \subset E$, $P_1(F) \neq 0$, $P_2(F) \neq 0$, and
$f_1 \neq f_2$ over $F$. Let $K_i = \{x \in X \setminus N : g_i(x) \neq 0\}$ for $i = 1, 2$. It is
clear that both $E \cap K_1 \neq \emptyset$ and $E \cap K_2 \neq \emptyset$, otherwise either
$P_1(E) = 0$ or $P_2(E) = 0$, which is a contradiction to our hypothesis. This means that
$(E \cap K_i) \subset E$ is a proper subset of $E$ and, by hypothesis, $P_i(E \cap K_j) \neq 0$
($i, j \in \{1, 2\}$). This implies that $(E \cap K_1) \cap (E \cap K_2) \neq \emptyset$. By
definition of $(E \cap K_1) \cap (E \cap K_2)$, we deduce that
$(\alpha Q_1 + \beta Q_2)(E \cap K_1 \cap K_2) \neq 0$. Thus for the set $E \cap K_1 \cap K_2$ the
log-sum inequality holds strictly. Hence, the inequality would be strict. This concludes the
proof. □

As a special case of Theorem 3.2, we obtain the following corollary.

Corollary 3.1. If $Q = Q_1 = Q_2$ in Theorem 3.2, then the convexity is strict if and only if
there exists a set

$$K = \Big\{ x \in X : \frac{dP_1}{dQ}(x) \neq \frac{dP_2}{dQ}(x) \neq 0 \Big\}$$

such that $Q(K) > 0$.

Proof. From Theorem 3.2, the strict inequality holds if and only if there exists
$E \in \mathcal{B}_X$ such that $\frac{dP_1}{dQ} \neq \frac{dP_2}{dQ} \neq 0$ on $E$ and for every
proper $F \subset E \in \mathcal{B}_X$, $P_1(F) > 0$ and $P_2(F) > 0$. Taking a nonempty
$K \subset E$, the direct part of the assertion is proved.

For the reverse part, suppose there exists a set $K$ as in the hypothesis. Let $N$ be the
maximal null set of $Q$ and let $E = K \setminus N$. Now, it can be verified that for any proper
Borel-measurable subset $F \subset E$, we have $P_1(F) > 0$ and $P_2(F) > 0$. This proves the
reverse direction of the assertion. □



Now, we use Theorem 3.2 and Corollary 3.1 to establish a proposition on global properties 
of the mutual information. This can be considered as a generalization of a similar result in 
[7] for channels with side information. However, we provide a rigorous proof for this more 
general proposition, since later in the paper, we use some of the intermediate results. 

Proposition 3.2. The mutual information (6) is concave with respect to $P$, convex with
respect to $W_{Q_v}(\cdot|x)$, and linear with respect to $R$.$^4$

Proof. The linearity with respect to $R$ is clearly seen by (6). The convexity with respect
to $W_{Q_v}(\cdot|x)$ follows by the convexity of $D(W_{Q_v}(\cdot|x)\|PW_{Q_v})$, which can be
verified by Theorem 3.2.

To prove the concavity with respect to the input distribution $P$, let $0 < \alpha < 1$,
$\beta = 1 - \alpha$, and $P = \alpha P_1 + \beta P_2$. By linearity, this implies that
$PW_{Q_v} = \alpha P_1 W_{Q_v} + \beta P_2 W_{Q_v}$. Pick an auxiliary probability measure $T_v$
(conditional on $v$) over $Y$ such that $PW_{Q_v} \ll T_v$; the existence of such a measure is
obvious. Since $W_{Q_v}(\cdot|x) \ll PW_{Q_v}$ and $PW_{Q_v} \ll T_v$, then
$W_{Q_v}(\cdot|x) \ll T_v$. By Proposition 3.1, we also know that $D(PW_{Q_v}\|T_v) < \infty$. Let us
consider the mutual information for a fixed value $v$, and denote it by $I(P, W_{Q_v})$. As a
result, we can expand it as

$$\begin{aligned}
I(P, W_{Q_v}) &= \iint \log_2 \frac{dW_{Q_v}(\cdot|x)}{d(PW_{Q_v})}\, dW_{Q_v}(\cdot|x)\, dP \\
&= \iint \log_2 \frac{dW_{Q_v}(\cdot|x)}{dT_v}\, dW_{Q_v}(\cdot|x)\, dP - \iint \log_2 \frac{d(PW_{Q_v})}{dT_v}\, dW_{Q_v}(\cdot|x)\, dP.
\end{aligned}$$

Now, we can use Fubini's theorem to change the order of integration in the second term and
apply Theorem 3.2 to obtain:

$$\begin{aligned}
I(P, W_{Q_v}) &= \iint \log_2 \frac{dW_{Q_v}(\cdot|x)}{dT_v}\, dW_{Q_v}(\cdot|x)\, dP - \int \log_2 \frac{d(PW_{Q_v})}{dT_v}\, d(PW_{Q_v}) \qquad (7) \\
&\ge \alpha \iint \log_2 \frac{dW_{Q_v}(\cdot|x)}{dT_v}\, dW_{Q_v}(\cdot|x)\, dP_1 - \alpha \int \log_2 \frac{d(P_1 W_{Q_v})}{dT_v}\, d(P_1 W_{Q_v}) \\
&\quad + \beta \iint \log_2 \frac{dW_{Q_v}(\cdot|x)}{dT_v}\, dW_{Q_v}(\cdot|x)\, dP_2 - \beta \int \log_2 \frac{d(P_2 W_{Q_v})}{dT_v}\, d(P_2 W_{Q_v}).
\end{aligned}$$

Noting that $P_1 W_{Q_v} \ll T_v$, $P_2 W_{Q_v} \ll T_v$ and using the above arguments, we can
contract the RHS to obtain

$$I(P, W_{Q_v}) \ge \alpha I(P_1, W_{Q_v}) + \beta I(P_2, W_{Q_v}).$$

Because this holds for every $v$, we can integrate both sides of the above inequality with
respect to $R$ and deduce that

$$I(P, W_{Q_v}|R) \ge \alpha I(P_1, W_{Q_v}|R) + \beta I(P_2, W_{Q_v}|R).$$

This concludes the proof. □

4 Note that concavity, convexity, and linearity are with respect to the convex combination of the operands.
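
The concavity in the input measure can be illustrated for a fixed side information value on a finite-alphabet channel. The following sketch uses a binary symmetric channel with crossover probability 0.1; the function `mutual_information` and the chosen input distributions are hypothetical and only serve as a numerical check.

```python
import math

def mutual_information(P, W):
    """I(P, W) in bits for an input pmf P and a row-stochastic channel matrix W[x][y]."""
    n_y = len(W[0])
    # Output distribution PW(y) = sum_x P(x) W(y|x).
    out = [sum(P[x] * W[x][y] for x in range(len(P))) for y in range(n_y)]
    total = 0.0
    for x, px in enumerate(P):
        for y in range(n_y):
            w = W[x][y]
            if px > 0 and w > 0:
                total += px * w * math.log2(w / out[y])
    return total

# Hypothetical binary symmetric channel with crossover probability 0.1.
W = [[0.9, 0.1], [0.1, 0.9]]
P1, P2 = [0.9, 0.1], [0.1, 0.9]
a = 0.5
P_mix = [a * p + (1 - a) * q for p, q in zip(P1, P2)]

i_mix = mutual_information(P_mix, W)
i_avg = a * mutual_information(P1, W) + (1 - a) * mutual_information(P2, W)
```

Here the mixture is the uniform input, so `i_mix` equals one minus the binary entropy of 0.1, which exceeds the average `i_avg` of the two skewed inputs.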

Proposition 3.2 addresses the concavity of the mutual information with respect to input 
measures. In the following proposition, we address its strictness. 



Proposition 3.3 (Strictness). The mutual information is strictly concave with respect to
the input measure if and only if the set

$$E = \Big\{ (y, v) \in Y \times V : \frac{d(P_1 W_{Q_v})}{d(PW_{Q_v})} \neq \frac{d(P_2 W_{Q_v})}{d(PW_{Q_v})} \Big\}$$

has $(PW_{Q_v} \times R)(E) > 0$. Moreover, if $T_v$ is a conditional probability measure on $Y$
such that $PW_{Q_v} \ll T_v$ for all $v \in V$, then strict concavity holds if and only if the set

$$E = \Big\{ (y, v) \in Y \times V : \frac{d(P_1 W_{Q_v})}{dT_v} \neq \frac{d(P_2 W_{Q_v})}{dT_v} \neq 0 \Big\}$$

has nonzero measure with respect to the product measure $T_v \times R$.

Proof. The proof follows from considering the proof of Proposition 3.2 together with
Corollary 3.1. For a fixed $v$, by Corollary 3.1, if there exists a set $E_v$ such that
$\frac{d(P_1 W_{Q_v})}{d(PW_{Q_v})} \neq \frac{d(P_2 W_{Q_v})}{d(PW_{Q_v})} \neq 0$ on $E_v$ and
$PW_{Q_v}(E_v) > 0$, then strictness holds. To have strictness in total, we need to have it
$R$-almost everywhere. The proof of the special case is immediate by definition of $E$. □

This concludes our discussion on convexity and concavity properties of the mutual
information.

3.2.3 Continuity 

So far, we have discussed the compactness of the set of input probability measures and some 
global properties of the mutual information. In this subsection, we discuss the continuity 
of the mutual information in the sense of weak* topology. However, before expressing the 
main result of this part, let us introduce a useful inequality. 

Lemma 3.2. For a channel with side information as specified by $W_{Q_v}(\cdot|x)$ in (2), let

$$|I|(P, W_{Q_v}|R) \triangleq \iiint \Big| \log_2 \frac{dW_{Q_v}(\cdot|x)}{d(PW_{Q_v})} \Big|\, dW_{Q_v}(\cdot|x)\, dP\, dR.$$

Then, the following inequalities hold

$$I(P, W_{Q_v}|R) \le |I|(P, W_{Q_v}|R) \le I(P, W_{Q_v}|R) + \frac{2}{e \ln 2}.$$

Proof. The first inequality is obvious. The second inequality follows from the simple
observation that $-\frac{1}{e \ln 2} \le x \log_2 x$ for $x \ge 0$. As a result, we have
$|x \log_2 x| \le x \log_2 x + \frac{2}{e \ln 2}$. Using this observation, the proof of the second
inequality follows. □
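
The elementary bound used in this proof can be verified numerically. The sketch below evaluates $x \log_2 x$ on a grid (the grid and the helper name `x_log2_x` are arbitrary illustrative choices) and checks both the lower bound $-1/(e \ln 2)$, attained at $x = 1/e$, and the resulting bound on the absolute value.

```python
import math

# The constant 1 / (e ln 2) from the proof.
C = 1.0 / (math.e * math.log(2.0))

def x_log2_x(x):
    """x * log2(x), extended by continuity with the value 0 at x = 0."""
    return 0.0 if x == 0 else x * math.log2(x)

grid = [k / 1000.0 for k in range(0, 5001)]  # points in [0, 5]

# Smallest value of x*log2(x) on the grid; the true minimum -C is attained at x = 1/e.
min_on_grid = min(x_log2_x(x) for x in grid)
value_at_minimizer = x_log2_x(1.0 / math.e)

# |x log2 x| <= x log2 x + 2C at every grid point (small tolerance for rounding).
bound_holds = all(abs(x_log2_x(x)) <= x_log2_x(x) + 2 * C + 1e-12 for x in grid)
```

The factor two arises because, wherever $x \log_2 x$ is negative, its absolute value exceeds it by twice its magnitude, which is at most $2/(e \ln 2)$.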

We now state and prove a novel sufficient condition for the continuity of mutual infor- 
mation. 

Theorem 3.3. Consider a channel with side information which is described by
$W_{Q_v}(\cdot|x)$, together with a closed collection of input probability measures
$\mathcal{P}_A(X)$. Suppose there exists a measure $T$ on $(Y, \mathcal{B}_Y)$ such that
$W_{Q_v}(\cdot|x) \ll T$ with density function $f_{T,Q_v}(y|x) = \frac{dW_{Q_v}(\cdot|x)}{dT}$. If

a. The function $f_{T,Q_v}(y|x)$ is continuous over $X \times Y \times V$, and
$f_{T,Q_v}(y|x) \log_2 f_{T,Q_v}(y|x)$ is uniformly integrable over
$\{T \times P \times R \mid P \in \mathcal{P}_A(X)\}$.

b. For fixed $y$ and $v$, the function $f_{T,Q_v}(y|x)$ is uniformly integrable over
$\mathcal{P}_A(X)$.

Then, the mutual information function is bounded and weak* continuous over $\mathcal{P}_A(X)$.

Proof. To show the continuity of $I(P, W_{Q_v}|R)$, we need to show that for every sequence
$P_n \xrightarrow{w^*} P$, we have $I(P_n, W_{Q_v}|R) \to I(P, W_{Q_v}|R)$. For this purpose, using
Proposition 3.1, similar to the proof of Proposition 3.2, we decompose the conditional mutual
information into two terms.

$$\begin{aligned}
I(P, W_{Q_v}|R) &= \iiint \log_2 \frac{dW_{Q_v}(\cdot|x)}{d(PW_{Q_v})}\, dW_{Q_v}(\cdot|x)\, dP\, dR \\
&= \iiint \log_2 \frac{dW_{Q_v}(\cdot|x)}{dT}\, dW_{Q_v}(\cdot|x)\, dP\, dR - \iint \log_2 \frac{d(PW_{Q_v})}{dT}\, d(PW_{Q_v})\, dR \\
&= \iiint f_{T,Q_v}(y|x) \log_2 f_{T,Q_v}(y|x)\, dT\, dP\, dR - \iint f_{T,P,Q_v}(y) \log_2 f_{T,P,Q_v}(y)\, dT\, dR,
\end{aligned}$$

where $f_{T,P,Q_v}(y) = \frac{d(PW_{Q_v})}{dT}$ denotes the density of the output measure with
respect to $T$. Momentarily, we assume that both terms are finite; we then justify this
assumption. Thus, we need only to show that both terms are bounded and continuous over
$\mathcal{P}_A(X)$.

Continuity of the first term: Since $P_n \xrightarrow{w^*} P$, by Proposition A.2, we have
$T \times P_n \times R \xrightarrow{w^*} T \times P \times R$. Because $f_{T,Q_v}(y|x)$ is
continuous, so is $f_{T,Q_v}(y|x) \log_2 f_{T,Q_v}(y|x)$. By hypothesis,
$f_{T,Q_v}(y|x) \log_2 f_{T,Q_v}(y|x)$ is uniformly integrable over
$\{T \times P \times R \mid P \in \mathcal{P}_A(X)\}$ (Definition A.2). Therefore, using
Theorem A.2, we deduce that

$$\iiint f_{T,Q_v}(y|x) \log_2 f_{T,Q_v}(y|x)\, dT\, dP_n\, dR \to \iiint f_{T,Q_v}(y|x) \log_2 f_{T,Q_v}(y|x)\, dT\, dP\, dR.$$

This proves the continuity of the first term. The finiteness of the first term is immediate by
the uniform integrability property.

Continuity of the second term: For fixed $y$ and $v$, since $f_{T,Q_v}(y|x)$ is uniformly
integrable over $\mathcal{P}_A(X)$, by Theorem A.2, we deduce that $P_n \xrightarrow{w^*} P$
implies the pointwise convergence of $f_{T,P_n,Q_v}(y) \to f_{T,P,Q_v}(y)$. By continuity of
$\log_2$, we deduce the pointwise convergence of
$f_{T,P_n,Q_v}(y) \log_2 f_{T,P_n,Q_v}(y) \to f_{T,P,Q_v}(y) \log_2 f_{T,P,Q_v}(y)$. It only remains
to show the convergence of their integrals with respect to $T \times R$. For this purpose, we
proceed as follows.

By Lemma 3.2 and its proof along with the log-sum inequality, for every $n$,

$$\begin{aligned}
|f_{T,P_n,Q_v}(y) \log_2 f_{T,P_n,Q_v}(y)| &\le \frac{2}{e \ln 2} + f_{T,P_n,Q_v}(y) \log_2 f_{T,P_n,Q_v}(y) \\
&\le \frac{2}{e \ln 2} + \int f_{T,Q_v}(y|x) \log_2 f_{T,Q_v}(y|x)\, dP_n.
\end{aligned}$$

But, we have already shown that the integration of the RHS over $T \times R$ leads to a
convergent sequence of integrals. Thus, by the generalized Dominated Convergence Theorem
[13, p. 59], we deduce that

$$\iint f_{T,P_n,Q_v}(y) \log_2 f_{T,P_n,Q_v}(y)\, dT\, dR \to \iint f_{T,P,Q_v}(y) \log_2 f_{T,P,Q_v}(y)\, dT\, dR.$$

This implies the continuity of the second term. Note that its finiteness is obvious. Since
both terms are finite and continuous, we deduce the continuity of mutual information. This
concludes the proof. □

So far, we have discussed the conditions for compactness of the set of input probability 
measures and the strict concavity and continuity of the mutual information. The following 
section demonstrates the application of these results for capacity analysis purposes. 



4 Capacity analysis 

In this section, we address the capacity analysis for continuous alphabet channels with side 
information. We provide a coding and converse coding argument for the capacity value of 
the channels of our interest, and we address the existence, the uniqueness, and the charac- 
terization of the capacity-achieving input measure. 

4.1 Channel capacity 

Consider the channel of interest described by $W_{Q_v}(\cdot|x)$. Let $g : X \to \mathbb{R}^k$ be a
nonnegative Borel-measurable function that satisfies the hypothesis of Lemma 3.1. Let
$\Gamma \in \mathbb{R}^{+k}$ and let $\mathcal{P}_{g,\Gamma}(X)$ be defined as in Lemma 3.1. We show
that

$$C = \sup_{P \in \mathcal{P}_{g,\Gamma}(X)} I(P, W_{Q_v}|R) \qquad (8)$$

is the capacity of the channel. For this purpose, we use the results of [7] to express the
coding and converse coding theorem for the case of continuous alphabet channels with side
information.

Lemma 4.1 (Converse Coding Lemma). Consider a collection of probability measures
$\mathcal{P}_{g,\Gamma}(X)$ on $X$. For any $\delta > 0$, there exist $n_0$ and $\epsilon > 0$ such
that for every code $(f, \phi)$ of length $n \ge n_0$ with $N$ codewords whose empirical measures
all belong to $\mathcal{P}_{g,\Gamma}(X)$, if

$$\frac{1}{n} \log_2 N > \sup_{P \in \mathcal{P}_{g,\Gamma}(X)} I(P, W_{Q_v}|R) + \delta,$$

then the maximum error probability satisfies $e(W^n, f, \phi) > \epsilon$.

Proof. The proof follows from [7, Lemma 6]. We note that, here, the channel has only 
one strategy and we can consider Y x V as the output alphabet of our channel. Since 
V is independent of the input, we can simplify the results of [7, Lemma 6] to obtain our 
assertion. □ 

By Lemma 4.1, we can easily verify that any rate $R > C$ is not achievable. Suppose not,
i.e., suppose $R > C$ is achievable. That is, for every $\delta > 0$ and $\epsilon > 0$ there
exists $n_0$ such that for every $n \ge n_0$, there exists a code with at least
$\lceil 2^{n(R-\delta)} \rceil$ codewords and error probability less than $\epsilon$. But this is a
contradiction to the assertion of Lemma 4.1.

Now, inspired by [7, Thm. 1], we state the coding theorem.

Theorem 4.1 (Coding Theorem). For every positive number $\delta$, there exist an integer $n_0$
and $\gamma > 0$ such that for block length $n > n_0$ and for any prescribed codeword type
$P \in \mathcal{P}_{g,\Gamma}(X)$ there exists a code with $N$ codewords, each of type $P$, such that

$$\frac{1}{n} \log_2 N > I(P, W_{Q_v}|R) - \delta \quad \text{and} \quad e(W^n, f, \phi) < 2^{-n\gamma}.$$



Proof. The proof is by [7, Thm. 1]. First consider $Y \times V$ as the output alphabet of the
channel. Then, noting that the CSI, $V$, is independent of the input, we can simplify the
results of [7, Thm. 1] to obtain our assertion. □

Since the result of Theorem 4.1 holds for every input measure, it holds for their supremum.
Hence, we can deduce that for every $\delta > 0$ and sufficiently large block length, there exist codes
with rate $R > \sup_{P \in \mathcal{P}_{g,\Gamma}(X)} I(P, W_{Q_v}|R) - \delta$. Because this is true for every $\delta > 0$,
using the Converse Coding Lemma, we deduce that the channel capacity is

$$C = \sup_{P \in \mathcal{P}_{g,\Gamma}(X)} I(P, W_{Q_v}|R).$$

4.2 Existence 

In this subsection, we give a sufficient condition for the existence of an optimal input measure,
say $P_o$, such that the capacity is achievable by some code with codewords of type $P_o$.

Proposition 4.1. Let $\mathcal{P}_A(X)$ denote a weak* compact collection of probability measures on
$(X, \mathcal{B}_X)$ and let the channel be described by $W_{Q_v}(\cdot|x)$. If the mutual information
$I(P, W_{Q_v}|R)$ is continuous over $\mathcal{P}_A(X)$, then it is bounded and achieves its maximum on $\mathcal{P}_A(X)$.

Proof. We claim that the range of $I(P, W_{Q_v}|R)$ is bounded. Suppose not. Then, for every
$n \in \mathbb{N}$, there exists $P_n \in \mathcal{P}_A(X)$ such that $I(P_n, W_{Q_v}|R) > n$. But the sequence
$(P_n)_{n=1}^{\infty}$ belongs to $\mathcal{P}_A(X)$, which is a weak* compact family. By definition this means that
there exists a weak* convergent subsequence $P_{n_k} \xrightarrow{w^*} P$. By closedness of $\mathcal{P}_A(X)$, we know
that $P \in \mathcal{P}_A(X)$, hence $I(P, W_{Q_v}|R)$ is finite and, by continuity,
$I(P_{n_k}, W_{Q_v}|R) \to I(P, W_{Q_v}|R)$. This is a contradiction to $I(P_{n_k}, W_{Q_v}|R) > n_k$.
Thus, the range of the mutual information function is bounded.

Since the range of mutual information is bounded, it has a supremum. Let us denote this
supremum value by $M$. By definition of supremum, for every $n$, there exists $P_n$ such that
$I(P_n, W_{Q_v}|R) > M - \frac{1}{n}$. By weak* compactness of $\mathcal{P}_A(X)$, there exists a weak* convergent
subsequence $P_{n_k} \xrightarrow{w^*} P$. By continuity of $I(P, W_{Q_v}|R)$,
$\lim_k I(P_{n_k}, W_{Q_v}|R) = I(P, W_{Q_v}|R)$.
This requires that $M = I(P, W_{Q_v}|R)$, which means that the maximum is achieved by $P$. □

Since $\mathcal{P}_{g,\Gamma}(X)$ is weak* compact and $I(P, W_{Q_v}|R)$ is continuous over $\mathcal{P}_{g,\Gamma}(X)$, by
Proposition 4.1, there exists a capacity-achieving measure $P_o \in \mathcal{P}_{g,\Gamma}(X)$. In the next subsection,
we address a condition for the uniqueness of the capacity-achieving measure.

4.3 Uniqueness 

In this subsection, we address sufficient conditions for the uniqueness of the
capacity-achieving measure, a topic that is of interest from both practical and theoretical
standpoints.

Proposition 4.2. Suppose $\mathcal{P}_A(X)$ is a convex set of input measures and $W_{Q_v}(\cdot|x)$ denotes a
channel with side information. Assuming the existence of a capacity-achieving input measure
$P_o$, it is unique upon the satisfaction of the hypothesis of Proposition 3.3.

Proof. Suppose there exists another input measure $P^* \in \mathcal{P}_A(X)$ that also achieves the
capacity. For $P_o$ and $P^*$, if the hypothesis of Proposition 3.3 is satisfied, then their convex
combination achieves a higher mutual information, which is a contradiction. □

4.4 Characterization

Now, we show how to characterize the capacity-achieving probability measure. Let
$g : X \to \mathbb{R}^k$ be a continuous positive function that satisfies the hypothesis of Lemma 3.1, and let
$\Gamma \in \mathbb{R}^{+k}$. By Lemma 3.1, the set of probability measures $\mathcal{P}_{g,\Gamma}(X)$ is weak* compact. Moreover,
since the functionals $\int g_i\,dP$ are linear over the space of probability measures, the constraints
$\int g_i\,dP \le \Gamma_i$ make $\mathcal{P}_{g,\Gamma}(X)$ a convex set. Suppose that the mutual information function is
weak* continuous over $\mathcal{P}_{g,\Gamma}(X)$. By Proposition 4.1, the mutual information function assumes
its maximum over $\mathcal{P}_{g,\Gamma}(X)$. The problem is how to characterize the maximizing measure.

To characterize the capacity-achieving measure, we use the global theory of constrained 
optimization [20] which uses Lagrange multipliers to facilitate the optimization problem. 
Applying the results of [20, p. 217], we obtain the following result. 

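Lemma 4.2. Let $C = \sup_{P \in \mathcal{P}_{g,\Gamma}(X)} I(P, W_{Q_v}|R)$. Then, there exists an element
$\gamma \in \mathbb{R}^{+k}$ such that

$$C = \sup \left\{ I(P, W_{Q_v}|R) - \sum_{i=1}^{k} \gamma_i \left( \int g_i\,dP - \Gamma_i \right) : \text{for all } P \in \mathcal{P}_{g,\Gamma}(X) \right\}.$$

Furthermore, this supremum is achieved by a probability measure $P^* \in \mathcal{P}_{g,\Gamma}(X)$ such that
$\gamma_i \left( \int g_i\,dP^* - \Gamma_i \right) = 0$ for $i = 1, \ldots, k$.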

Proof. It suffices to show that our optimization problem satisfies the hypothesis of [20,
Theorem 1, p. 217]. Here, we have $\mathcal{P}_{g,\Gamma}(X)$ as the convex space we are optimizing over,
$\int g_i\,dP$ as the convex constraint functions, and the mutual information as a concave function
whose negative is our objective function over $\mathcal{P}_{g,\Gamma}(X)$. As we have discussed before,
$\mathcal{P}_{g,\Gamma}(X)$ is a nonempty, weak* compact, and convex set. Since mutual information is weak*
continuous over it, $C$ is finite. By Theorem 1 in [20, p. 217], we deduce that there exists $\gamma \ge 0$
that satisfies the hypothesis. This concludes the proof of the first assertion. Moreover, since
mutual information achieves its maximum over $\mathcal{P}_{g,\Gamma}(X)$, the second assertion holds. □

To obtain the optimum probability measure in Lemma 4.2, we need some simplifying
necessary and sufficient conditions, which we define as follows. Let

$$f(P) \triangleq I(P, W_{Q_v}|R) - \sum_{i=1}^{k} \gamma_i \left( \int g_i\,dP - \Gamma_i \right). \tag{9}$$

It can be seen that for every $P \in \mathcal{P}_{g,\Gamma}(X)$, $f(P) < \infty$. This comes from the finiteness of
both terms. Note that the second term is finite by definition of $\mathcal{P}_{g,\Gamma}(X)$, and the finiteness
of the first term follows from Proposition 4.1. The weak* continuity and the concavity of
(9) follow similarly. By definition of the Gateaux differential [20, p. 171], if for $\theta \in [0,1]$ the
limit

$$\delta f(P_o, P) \triangleq \lim_{\theta \to 0} \frac{1}{\theta} \left[ f(\theta P + (1-\theta) P_o) - f(P_o) \right] \tag{10}$$

exists, then we call it the differential of $f$ at $P_o$ with increment $P$. If (10) exists for
all $P \in \mathcal{P}_{g,\Gamma}(X)$, we say that $f$ is differentiable at $P_o$. We state and prove the following
theorem.

Theorem 4.2. The supremum of $f$ is attained by $P_o \in \mathcal{P}_{g,\Gamma}(X)$ if and only if $f$ is
differentiable at $P_o$ and $\delta f(P_o, P) \le 0$ for every $P \in \mathcal{P}_{g,\Gamma}(X)$.

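Proof. To prove the necessity, take any $P \in \mathcal{P}_{g,\Gamma}(X)$. For $0 < \theta \le 1$, let
$P_\theta = \theta P + (1-\theta) P_o$. By convexity of $\mathcal{P}_{g,\Gamma}(X)$, $P_\theta \in \mathcal{P}_{g,\Gamma}(X)$. Since $f$ attains its
supremum at $P_o$, we have $f(P_\theta) \le f(P_o)$, which implies that $\frac{1}{\theta}[f(P_\theta) - f(P_o)] \le 0$. This
implies that $\delta f(P_o, P) \le 0$ upon its existence.

Moreover, since $f$ is a concave function with respect to $P$, we know that
$\theta f(P) + (1-\theta) f(P_o) \le f(P_\theta)$. This implies that

$$f(P) - f(P_o) \le \frac{1}{\theta} \left[ f(P_\theta) - f(P_o) \right].$$

Since both $f(P)$ and $f(P_o)$ are finite, $\frac{1}{\theta}[f(P_\theta) - f(P_o)]$ is bounded below for all values of
$\theta$. Since $\theta \to 0$ implies that $P_\theta \xrightarrow{w^*} P_o$, by weak* continuity of $f$, we have $f(P_\theta) \to f(P_o)$.
Moreover, $\frac{1}{\theta}[f(P_\theta) - f(P_o)]$ is bounded and, by concavity of $f$, monotone in $\theta$, so the
existence of its deleted limit at $\theta = 0$ is immediate [14, p. 175]. Therefore, for all
$P \in \mathcal{P}_{g,\Gamma}(X)$, $\delta f(P_o, P)$ exists and $\delta f(P_o, P) \le 0$.
This concludes the proof of the necessity.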

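To prove sufficiency, we proceed by contradiction. Suppose the assertion is not true.
That is, there exists a probability measure $P^*$ such that $f(P^*) > f(P_o)$. By concavity of $f$,
for $0 \le \theta < 1$ we would have

$$f(\theta P_o + (1-\theta) P^*) \ge \theta f(P_o) + (1-\theta) f(P^*) > f(P_o).$$

Dividing $f(\theta P_o + (1-\theta) P^*) - f(P_o)$ by the weight $1-\theta$ on $P^*$ shows that the difference
quotient in (10) with increment $P^*$ is at least $f(P^*) - f(P_o) > 0$,
which creates a contradiction to the non-positiveness of the differential. □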
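As a numerical sanity check of definition (10), the following sketch evaluates the differential for a finite-alphabet channel without side information and without input constraints. This is a simplification and not the setting of this paper. In that simplified case the limit has the closed form $\sum_x P(x) D(W(\cdot|x) \| P_o W) - I(P_o, W)$, and the code compares it with the difference quotient. The function names, the test channel, and the step size `theta` are illustrative assumptions.

```python
import numpy as np

def kl_rows(W, q):
    """D(W(.|x) || q) in nats for every input letter x (W has shape [nx, ny])."""
    ratio = np.where(W > 0, W / q, 1.0)  # log(1) = 0 wherever W(y|x) = 0
    return np.sum(W * np.log(ratio), axis=1)

def mutual_information(p, W):
    """I(p, W) in nats, written as the p-average of D(W(.|x) || pW)."""
    return float(p @ kl_rows(W, p @ W))

def gateaux_closed_form(p0, p, W):
    """Closed form of the differential of I at p0 with increment p."""
    return float(p @ kl_rows(W, p0 @ W) - mutual_information(p0, W))

def gateaux_finite_difference(p0, p, W, theta=1e-6):
    """Difference quotient from definition (10) at a small positive theta."""
    p_theta = theta * p + (1.0 - theta) * p0
    return (mutual_information(p_theta, W) - mutual_information(p0, W)) / theta

if __name__ == "__main__":
    # Hypothetical binary channel with strictly positive transition probabilities.
    W = np.array([[0.9, 0.1], [0.2, 0.8]])
    p0 = np.array([0.5, 0.5])
    p = np.array([0.9, 0.1])
    print(gateaux_closed_form(p0, p, W), gateaux_finite_difference(p0, p, W))
```

For a binary symmetric channel the uniform input is capacity-achieving, and the closed form evaluates to zero for every increment, in agreement with the sign condition of Theorem 4.2.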

To characterize the capacity-achieving probability measure $P_o$, by Theorem 4.2, it suffices
to check the sign of $\delta f(P_o, P)$ for all $P \in \mathcal{P}_{g,\Gamma}(X)$. Recalling the finiteness of the following
terms, one can easily verify that

$$\delta f(P_o, P) = \int\!\!\int \Big[ D\big(W_{Q_v}(\cdot|x) \,\|\, P_o W_{Q_v}\big) - \sum_{i=1}^{k} \gamma_i g_i(x) \Big]\,dP\,dR
 - \int\!\!\int \Big[ D\big(W_{Q_v}(\cdot|x) \,\|\, P_o W_{Q_v}\big) - \sum_{i=1}^{k} \gamma_i g_i(x) \Big]\,dP_o\,dR.$$

Noting that $P_o$ is the capacity-achieving measure, by Theorem 4.2 this means that

$$\int\!\!\int \Big[ D\big(W_{Q_v}(\cdot|x) \,\|\, P_o W_{Q_v}\big) - \sum_{i=1}^{k} \gamma_i g_i(x) \Big]\,dP\,dR \le C - \sum_{i=1}^{k} \gamma_i \Gamma_i \tag{11}$$

for all $P \in \mathcal{P}_{g,\Gamma}(X)$. The following result simplifies this condition.

Theorem 4.3 (Kuhn-Tucker conditions). The capacity-achieving measure is $P_o$ if and
only if there exists $\gamma \ge 0$ such that

$$\forall x \in X, \quad \int D\big(W_{Q_v}(\cdot|x) \,\|\, P_o W_{Q_v}\big)\,dR - \sum_{i=1}^{k} \gamma_i g_i(x) \le C - \sum_{i=1}^{k} \gamma_i \Gamma_i \tag{12}$$

where the equality holds $P_o$-almost everywhere.

Proof. The inverse part can be verified immediately from Theorem 4.2 and (11). For the
direct part, since $P$ is arbitrary, we can take $P$ as Dirac measures at different points, which
results in the asserted inequality. By (11) and Theorem 4.2 we conclude the optimality of
$P_o$. For the rest of the assertion, suppose that it is not true. That is, there exists a set
$E \in \mathcal{B}_X$ with $P_o(E) > 0$ on which the inequality in (12) is strict. Now, integrating the LHS
of (12) with respect to $P_o$ and decomposing the integration over $E$ and $E^c$, one can verify that
this assumption leads to the inequality
$C - \sum_{i=1}^{k} \gamma_i \Gamma_i < C - \sum_{i=1}^{k} \gamma_i \Gamma_i$, which is a contradiction. □

Theorem 4.3 provides the necessary and sufficient conditions for the capacity-achieving
measure in its most general form for continuous alphabet channels with side information at 
the receiver. Similar results are known for finite alphabet channels [1], [21] and [22]. For 
these channels, systematic algorithms are known to find the capacity-achieving measure [22]. 
In contrast, such algorithms are not known for continuous alphabet channels. However, one 
might be able to find the solution of Theorem 4.3 for special classes of channels. 
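To make the finite-alphabet counterpart of Theorem 4.3 concrete, the following sketch runs an iteration of Blahut-Arimoto type and then checks the Kuhn-Tucker condition numerically. It assumes a discrete memoryless channel with no side information and no input constraint, so that (12) reduces to $D(W(\cdot|x) \| P_o W) \le C$ with equality on the support. Whether this is the algorithm of [22] is not asserted here; the function names, tolerance, and example channel are illustrative assumptions.

```python
import numpy as np

def kl_rows(W, q):
    """D(W(.|x) || q) in nats for every input letter x (W has shape [nx, ny])."""
    ratio = np.where(W > 0, W / q, 1.0)  # log(1) = 0 wherever W(y|x) = 0
    return np.sum(W * np.log(ratio), axis=1)

def blahut_arimoto(W, tol=1e-12, max_iter=10_000):
    """Return (p, C, d): input pmf, capacity estimate in nats, and d[x] = D(W(.|x) || pW)."""
    nx = W.shape[0]
    p = np.full(nx, 1.0 / nx)  # start from the uniform input
    for _ in range(max_iter):
        d = kl_rows(W, p @ W)
        lower = float(p @ d)    # I(p, W), a lower bound on capacity
        upper = float(d.max())  # max_x D(W(.|x) || pW), an upper bound on capacity
        if upper - lower < tol:
            break
        p = p * np.exp(d)       # multiplicative update
        p /= p.sum()
    d = kl_rows(W, p @ W)       # recompute so that d matches the returned p
    return p, float(p @ d), d

if __name__ == "__main__":
    # Z-channel: input 0 is noiseless, input 1 is flipped with probability 0.5.
    W = np.array([[1.0, 0.0], [0.5, 0.5]])
    p, C, d = blahut_arimoto(W)
    print("input pmf:", p)
    print("capacity (nats):", C)
    print("D(W(.|x) || pW) per input:", d)
```

At convergence every entry of `d` is at most `C` up to the tolerance, and the entries attached to inputs with positive probability equal `C`, which is the discrete form of the equality condition in Theorem 4.3.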

Because of the importance of Theorem 4.3, let us rephrase the assertion of Theorem 4.3
more intuitively. For a given probability measure $P$ on $(X, \mathcal{B}_X)$, the support is defined as

$$S_X(P) = \{ x \in X \mid \forall \text{ open } U \in \mathcal{B}_X \text{ that contains } x,\ P(U) > 0 \}.$$

The capacity-achieving measure is such that the equality in (12) occurs if and only if
$x \in S_X(P_o)$.

In an effort to characterize the support of the capacity-achieving measure, suppose
$X = \mathbb{C}^n$ and let us define $\rho : X \to \mathbb{R}$ as

$$\rho(x) \triangleq \int \Big[ D\big(W_{Q_v}(\cdot|x) \,\|\, P_o W_{Q_v}\big) - \sum_{i=1}^{k} \gamma_i g_i(x) \Big]\,dR + \sum_{i=1}^{k} \gamma_i \Gamma_i - C.$$

Let $Z = \mathbb{C}^{2n}$ and consider the extension $\rho : Z \to \mathbb{C}$ obtained by replacing $\mathrm{Re}(x_j) = z_j$ and
$\mathrm{Im}(x_j) = z_{n+j}$, corresponding to a natural embedding $\xi : X \to Z$. This means that $\rho(z)$ is
real-valued for $z \in \mathcal{R}_\rho(Z)$, where $\mathcal{R}_\rho(Z)$ denotes the range of $\rho$. For every set $U \subset Z$, let
$X_U = \xi^{-1}(U \cap \mathcal{R}_\rho(Z))$ denote the inverse image of $U$ under $\xi$. Using the properties of
analytic functions [23], we state and prove the following proposition.

Proposition 4.3. Let $\rho(z)$ be analytic on an open set $U \subset Z$, and let $X_U$ be the inverse
image of $U$ under $\xi$. If $S_X(P_o) \cap X_U$ has an interior point, then $X_U \subset S_X(P_o)$.

Proof. Suppose $S_X(P_o) \cap X_U$ has an interior point, say $x_o$. Then, there exist
an $\epsilon > 0$ and an open ball of radius $\epsilon$ centered at $x_o$, $B_\epsilon(x_o)$, such that $B_\epsilon(x_o) \subset S_X(P_o)$.
This means that $\rho(x) = 0$ on $B_\epsilon(x_o)$, and consequently $\rho(z) = 0$ on $\xi(B_\epsilon(x_o)) \cap U$. Let
$z_o = \xi(x_o)$. Since $\rho$ is analytic at $z_o \in U$, there exists an open ball $B_r(z_o) \subset U$ (for some
$r > 0$) such that $\rho(z)$ can be represented as a Taylor series expansion on $B_r(z_o)$ [24]. Since
$\rho(x) = 0$ on $B_\epsilon(x_o)$, the coefficients of the Taylor expansion are all zero. This implies that
$\rho(z) = 0$ on $B_r(z_o)$. By the Uniqueness Theorem [24, p. 12], [23], we conclude that $\rho(z) = 0$ on
$U$. This means that $\rho(x) = 0$ on $X_U$, which implies that $X_U \subset S_X(P_o)$. □

By Proposition 4.3, one can verify that if for some channel, the function p(z) is analytic 
on Z, then either the support includes no interior point or it is equal to X. 
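As an illustration of the second alternative, consider a worked example that is not taken from this paper: the real scalar additive Gaussian noise channel $Y = X + N$ with $N \sim \mathcal{N}(0, \sigma^2)$, no side information, the single constraint function $g(x) = x^2$ with bound $\Gamma$, and logarithms in nats. Under these assumptions the Gaussian input satisfies the Kuhn-Tucker condition of Theorem 4.3 with equality at every $x$, so the associated function $\rho$ vanishes identically and the support is the whole real line.

```latex
% Assumptions (illustrative only): Y = X + N, N ~ N(0, sigma^2), g(x) = x^2,
% constraint \int x^2 dP <= \Gamma, no side information, natural logarithms.
\begin{align*}
P_o &= \mathcal{N}(0, \Gamma), \qquad
P_o W = \mathcal{N}(0, \Gamma + \sigma^2), \qquad
C = \tfrac{1}{2} \ln\!\Big(1 + \frac{\Gamma}{\sigma^2}\Big), \\
D\big(W(\cdot|x) \,\|\, P_o W\big)
  &= \tfrac{1}{2} \ln\frac{\Gamma + \sigma^2}{\sigma^2}
     + \frac{\sigma^2 + x^2}{2(\Gamma + \sigma^2)} - \tfrac{1}{2}
   = C + \frac{x^2 - \Gamma}{2(\Gamma + \sigma^2)}, \\
\gamma = \frac{1}{2(\Gamma + \sigma^2)}
  \;&\Longrightarrow\;
  D\big(W(\cdot|x) \,\|\, P_o W\big) - \gamma x^2 = C - \gamma \Gamma
  \quad \text{for all } x \in \mathbb{R}.
\end{align*}
```

The second line uses the standard expression for the divergence between two univariate Gaussian laws; the multiplier $\gamma$ is strictly positive, consistent with an active second-moment constraint.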

This concludes our discussion on capacity analysis of continuous alphabet channels with
side information. In Part II of this two-part paper, we use this framework to study the
capacity analysis problem for multiple antenna channels.

5 Conclusion



In this part, we established a general analytical framework for capacity analysis of continuous 
alphabet channels with side information (at the receiver). We studied the mutual information 
of these channels along with some of its analytical properties such as strict concavity and 
continuity. We established novel necessary and sufficient conditions for strict concavity 
and continuity of the mutual information in the weak* topology. We used these results and 
addressed issues regarding the existence, uniqueness, and the expression of capacity-achieving 
measure. 

The results of this work can be used for capacity analysis of different classes of channels. 
Specifically, as will be shown in Part II of this paper, these results are useful for capacity
assessment of multiple antenna fading channels, fast or slow, Rician or Rayleigh, with partial 
or no CSI at the receiver, where the input probability measure could be subject to any 
combination of moment constraints. 

Appendix 

A Preliminaries 

In this appendix, we discuss some analytical notions and properties that are used throughout
this paper. Some of these results are new, while others review previous work and are
restated here for the sake of completeness.

A.1 Weak* topology

Let $(X, \mathcal{B}_X)$ be an LCH Borel-measurable space. The weak* topology is defined as follows.
Let $C_0(X)$ denote the space of continuous functions from $X$ to $\mathbb{R}$ which vanish at infinity,
i.e.,

$$C_0(X) \triangleq \{ f : X \to \mathbb{R} \mid f \text{ is continuous and vanishes at infinity} \}.$$

By the Riesz representation theorem [13], the dual space of $C_0(X)$ is isomorphic to the
space of Radon measures $\mathcal{M}(X)$ over the measurable space $(X, \mathcal{B}_X)$. To study the effect
of an operation over $\mathcal{M}(X)$, there are different topologies that can be considered on $\mathcal{M}(X)$.
The only crucial requirement is that the topology should be well behaved with respect to
the operation of interest. In probability theory, where the objects of interest are the set of
probability measures $\mathcal{P}(X) \subset \mathcal{M}(X)$, the weak* topology is used, which is the weakest topology
on $\mathcal{M}(X)$ defined as follows. For each $f \in C_0(X)$ and every open set $G \subset \mathbb{R}$, let

$$U(f, G) \triangleq \Big\{ \mu \in \mathcal{M}(X) \;\Big|\; \int f\,d\mu \in G \Big\}.$$

The collection of all subsets $U(f, G) \subset \mathcal{M}(X)$ forms a basis for the weak* topology on $\mathcal{M}(X)$.
The collection of all subsets which are formed by arbitrary unions or finite intersections
of the basis subsets forms the weak* topology.

A.2 Convergence

In the weak* topology, the convergence phenomenon is called weak* convergence^5 and is defined as
follows. A sequence of probability measures converges weakly*, denoted by $P_n \xrightarrow{w^*} P$, if and
only if $\int f\,dP_n \to \int f\,dP$ for all $f \in C_0(X)$ [13]. Since our focus is on probability measures
$\mathcal{P}(X) \subset \mathcal{M}(X)$, where all measures have unit norm, this is equivalent to saying that a
sequence of probability measures converges weakly*, $P_n \xrightarrow{w^*} P$, if and only if
$\int f\,dP_n \to \int f\,dP$ for all $f \in C_b(X)$, where

$$C_b(X) \triangleq \{ f : X \to \mathbb{R} \mid f \text{ is continuous and bounded} \}$$

denotes the set of all bounded continuous functions.

^5 In textbooks on probability theory, the term vague is used instead of weak*.

Given two measures $\nu$ and $\mu$ over $(X, \mathcal{B}_X)$, $\nu$ is said to be absolutely continuous with
respect to $\mu$, denoted by $\nu \ll \mu$, if $\nu(E) = 0$ for every $E \in \mathcal{B}_X$ such that $\mu(E) = 0$. By
the Lebesgue-Radon-Nikodym theorem [13], there exists a $\mu$-integrable function $f$ such that

$$\nu(E) = \int_E f\,d\mu \quad \text{for every } E \in \mathcal{B}_X.$$

The function $f$ is unique $\mu$-almost everywhere ($\mu$-a.e.)
and is called the density (Radon-Nikodym derivative) of $\nu$ with respect to $\mu$, denoted by
$f = \frac{d\nu}{d\mu}$. As an example of a sequence of probability measures which is weak* convergent, let
us consider the following proposition.

Proposition A.1. Let $(P_n)$ be a sequence of probability measures which are absolutely
continuous with respect to some measure $\mu$ (e.g. Lebesgue measure). For each $n$, let
$f_n = \frac{dP_n}{d\mu}$ denote the density of $P_n$ with respect to $\mu$, and let $f$ be a function such that
$f_n \to f$ $\mu$-a.e. and $\int f\,d\mu = 1$. Then, $P_n \xrightarrow{w^*} P$, where $P$ is the probability measure defined as
$P(E) = \int_E f\,d\mu$ for every $E \in \mathcal{B}_X$. Moreover, for every $E \in \mathcal{B}_X$, $P(E) = \lim_n P_n(E)$.

Proof. Because $\{f_n\}$ are density functions for probability measures $\{P_n\}$ with respect to $\mu$,
we have $\int f_n\,d\mu = 1$. By Fatou's lemma, for every $E \in \mathcal{B}_X$,

$$P(E) = \int_E f\,d\mu \le \liminf_n \int_E f_n\,d\mu = \liminf_n P_n(E).$$

By [17, p. 311], this implies the weak* convergence. Moreover, noting that
$\int_E f\,d\mu + \int_{E^c} f\,d\mu = \int_E f_n\,d\mu + \int_{E^c} f_n\,d\mu = 1$, and applying the inequality above to $E^c$ as
well, we deduce that

$$\int_E f\,d\mu = \lim_n \int_E f_n\,d\mu.$$

This concludes the second part of the assertion. □
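A small numerical illustration of Proposition A.1 may help. The sketch below assumes, purely for illustration, that $P_n$ is the Gaussian law with mean $1/n$ and unit variance, so that its density converges pointwise to the standard Gaussian density, and it tracks $P_n(E)$ for the interval $E = [0, 2]$. The helper names and the choice of $E$ are hypothetical.

```python
import math

def normal_cdf(x, mean=0.0, std=1.0):
    """Cumulative distribution function of N(mean, std^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def prob_interval(a, b, mean, std=1.0):
    """Probability that an N(mean, std^2) variable falls in [a, b]."""
    return normal_cdf(b, mean, std) - normal_cdf(a, mean, std)

# P(E) for the limiting law N(0, 1) and E = [0, 2].
limit = prob_interval(0.0, 2.0, mean=0.0)

# |P_n(E) - P(E)| for P_n = N(1/n, 1) along a few values of n.
gaps = [abs(prob_interval(0.0, 2.0, mean=1.0 / n) - limit) for n in (1, 10, 100, 1000)]

if __name__ == "__main__":
    for n, gap in zip((1, 10, 100, 1000), gaps):
        print(n, gap)
```

The gaps shrink as $n$ grows, which is the setwise convergence $P(E) = \lim_n P_n(E)$ asserted in the second part of the proposition, here checked only on one set.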

To establish some of our results in this paper, it is of interest to verify whether the weak* 
convergence of a sequence of measures on one of these spaces implies the weak* convergence 
on the sequence of product measures. The following proposition is quite useful for this 
purpose. 

Proposition A.2. Let $(P_n)$ be a sequence of probability measures on $(X, \mathcal{B}_X)$ and let $T$ be
a probability measure on $(Y, \mathcal{B}_Y)$. Then, $P_n \xrightarrow{w^*} P$ implies $(P_n \times T) \xrightarrow{w^*} (P \times T)$.

Proof. For every open $E \in \mathcal{B}_X \otimes \mathcal{B}_Y$, let $E_y$ be as defined before. It is obvious that, for each
$y$, $E_y$ is an open set in $\mathcal{B}_X$. Therefore,

$$\begin{aligned}
(P \times T)(E) &= \int\!\!\int_E d(P \times T) \\
&= \int P(E_y)\,dT && \text{(by Tonelli's theorem)} \\
&\le \int \liminf_n P_n(E_y)\,dT && \text{([17, p. 311])} \\
&\le \liminf_n \int P_n(E_y)\,dT && \text{(Fatou's lemma)} \\
&\le \liminf_n\, (P_n \times T)(E) && \text{(by Tonelli's theorem).}
\end{aligned}$$

By [17, p. 311], this implies $(P_n \times T) \xrightarrow{w^*} (P \times T)$ and concludes the proof. □

Note that this can also be generalized to products of higher order. After this brief
introduction to some necessary properties on convergence of probability measures, we now
proceed to discuss the convergence of integrals, which is used to prove the continuity of
mutual information.

A.3 Uniform integrability

Some common sufficient conditions for convergence of a sequence of integrals are the mono- 
tone convergence theorem (MCT), the dominated convergence theorem (DCT), and the 
generalized dominated convergence theorem (GDCT) [13]. However, in this paper, we face 
a sequence of integrals whose convergence is not verifiable by any of these conditions. For 
our purposes, a less common condition exists known as uniform integrability. 

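Recalling that Radon probability measures are regular [13], i.e., for every $\epsilon > 0$, there
exists a compact subset $K \in \mathcal{B}_X$ such that $P(K) > 1 - \epsilon$, we express the following definition.

Definition A.1. Let $P \in \mathcal{P}(X)$. A collection of functions $\{f_\alpha\}_{\alpha \in A}$ is called uniformly
$P$-integrable if

$$\sup_{\alpha \in A} \int_{E_\alpha(c)} |f_\alpha|\,dP \to 0, \quad \text{as } c \to \infty,$$

where $E_\alpha(c) = \{ x \in X \mid |f_\alpha| > c \}$.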

A more general definition of uniform integrability for positive measures is perhaps more
familiar. However, we emphasize that Definition A.1 is equivalent to the more
general statement in the case of finite measures. We refer the interested reader to
[13, p. 92] and [17] for more details. In the following theorem, we show that the sequence of
integrals of a pointwise convergent sequence of uniformly $P$-integrable functions converges.

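Theorem A.1. Let $P \in \mathcal{P}(X)$ and let $\{f_\alpha\}_{\alpha \in A}$ be uniformly $P$-integrable. Let $(f_n)$ be a
sequence from $\{f_\alpha\}_{\alpha \in A}$ such that $f_n \to f$ $P$-almost everywhere ($P$-a.e.). Then, $f$ is integrable,

$$\int f_n\,dP \to \int f\,dP, \quad \text{and} \quad \int |f_n - f|\,dP \to 0.$$

Proof. For every set $E$, let $\chi_E$ denote its characteristic function. Let $g_{n,c} = f_n \chi_{E_n^c(c)}$ and
$g_c = f \chi_{E^c(c)}$, where $E(c) = \{ x \in X \mid |f| > c \}$. Since $f_n \to f$
$P$-a.e., then $g_{n,c} \to g_c$ $P$-a.e. Because $|g_{n,c}(x)| \le c$ for all $x$ and $n$, by DCT, we have
$\int g_{n,c}\,dP \to \int g_c\,dP$. That is,

$$\forall \epsilon > 0,\ \exists N \text{ such that} \quad
\left| \int_{E_n^c(c)} f_n\,dP - \int_{E^c(c)} f\,dP \right| \le \frac{\epsilon}{3} \quad \text{for } n > N.$$

Now, by the triangular inequality,

$$\begin{aligned}
\left| \int f_n\,dP - \int f\,dP \right|
&\le \left| \int_{E_n(c)} f_n\,dP \right|
 + \left| \int_{E_n^c(c)} f_n\,dP - \int_{E^c(c)} f\,dP \right|
 + \left| \int_{E(c)} f\,dP \right| \\
&\le \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon,
\end{aligned}$$

where the first and third terms are at most $\epsilon/3$ once $c$ is chosen large enough, by the
uniform integrability of $\{f_\alpha\}_{\alpha \in A}$ and the integrability of $f$.
This means that $\int f_n\,dP \to \int f\,dP$. To prove the other part of the assertion, we recall that
since $|f_n - f| \le |f_n| + |f|$, by GDCT it follows that $\int |f_n - f|\,dP \to 0$. □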
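The role of uniform integrability in Theorem A.1 can be seen on two simple families, chosen here only for illustration. On $[0,1]$ with Lebesgue measure, take $f_n = a_n \chi_{[0, 1/n]}$. Both families below converge to zero almost everywhere, but only the second one has tail integrals that vanish uniformly in $n$. The function names and the truncation level `n_max` are assumptions of this sketch, and the maximum over $n \le$ `n_max` is only a proxy for the supremum over all $n$.

```python
import math

def tail_mass(height, width, c):
    """Integral of f = height * 1_[0, width] over the set {f > c} (Lebesgue measure)."""
    return height * width if height > c else 0.0

def sup_tail(height_of, c, n_max=10_000):
    """Largest tail integral of f_n = height_of(n) * 1_[0, 1/n] over n = 1, ..., n_max."""
    return max(tail_mass(height_of(n), 1.0 / n, c) for n in range(1, n_max + 1))

def spike(n):
    """Heights a_n = n: the integral of f_n equals 1 for every n."""
    return float(n)

def mild(n):
    """Heights a_n = sqrt(n): the integral of f_n equals n ** -0.5."""
    return math.sqrt(n)

if __name__ == "__main__":
    for c in (1.0, 10.0, 50.0):
        print(c, sup_tail(spike, c), sup_tail(mild, c))
```

For the `spike` family the tail integral stays near one for every threshold `c`, and indeed $\int f_n\,dP = 1$ does not converge to $\int 0\,dP = 0$. For the `mild` family the tail integrals decrease with `c` and the integrals $n^{-1/2}$ converge to zero, as Theorem A.1 predicts.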

Another common scenario that arises in the context of convergence of integrals is the
case where we have a fixed integrand function but a sequence of probability measures. To deal
with such a scenario, let us establish the following definition.

Definition A.2. Let $\mathcal{P}_A(X)$ be a collection of probability measures over $(X, \mathcal{B}_X)$. A function
$f$ is called uniformly integrable over $\mathcal{P}_A(X)$ if

$$\sup_{P \in \mathcal{P}_A(X)} \int_{E(c)} |f|\,dP \to 0, \quad \text{as } c \to \infty,$$

where $E(c) = \{ x \in X \mid |f| > c \}$.



Using Definition A.2, we state and prove a sufficient condition for the convergence of the
sequence of integrals of a function with respect to a weak* convergent sequence of probability
measures.

Theorem A.2. Let $\mathcal{P}_A(X)$ be a closed collection of probability measures and let $(P_n)$ be a
weak* convergent sequence in it. If $f$ is a continuous function and uniformly integrable over
$\{P_n\}$, then $\int f\,dP_n \to \int f\,dP$.

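Proof. For every $c > 0$, let $E(c) = \{ x \in X \mid |f| > c \}$ and let $\chi_{E(c)}$ be its characteristic function.
By definition of uniform integrability of $f$ over $\{P_n\}$,

$$\forall \epsilon > 0,\ \exists\, c_\epsilon > 0 \text{ such that for all } n, \quad
\left| \int_{E(c)} f\,dP_n \right| \le \frac{\epsilon}{3} \quad \text{for } c > c_\epsilon.$$

Let $g_c = f \chi_{E^c(c)} + c\,\chi_{E(c)}$. Continuity of $f$ over $X$ implies the continuity of $g_c$ over $X$. By
weak* convergence of $\{P_n\}$,

$$\forall \epsilon > 0,\ \exists N \text{ such that} \quad
\left| \int g_c\,dP_n - \int g_c\,dP \right| \le \frac{\epsilon}{3} \quad \text{for } n > N.$$

By the triangular inequality,

$$\begin{aligned}
\left| \int f\,dP_n - \int f\,dP \right|
&\le \left| \int (f - g_c)\,dP_n \right|
 + \left| \int g_c\,dP_n - \int g_c\,dP \right|
 + \left| \int (f - g_c)\,dP \right| \\
&\le \left| \int_{E(c)} f\,dP_n \right|
 + \left| \int g_c\,dP_n - \int g_c\,dP \right|
 + \left| \int_{E(c)} f\,dP \right| \\
&\le \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon.
\end{aligned}$$

This means that $\int f\,dP_n \to \int f\,dP$, which concludes the proof. □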
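To see why the uniform integrability hypothesis of Theorem A.2 cannot simply be dropped, consider the following worked example, which is an illustration and not part of the original development. On $X = \mathbb{R}$, a small mass escaping to infinity leaves the weak* limit unchanged but can keep contributing to the integral of an unbounded continuous function.

```latex
% Illustrative counterexample on X = R (assumed, not from the paper).
\begin{align*}
P_n &= \Big(1 - \tfrac{1}{n}\Big)\,\delta_0 + \tfrac{1}{n}\,\delta_n, \qquad
\int h\,dP_n = \Big(1 - \tfrac{1}{n}\Big) h(0) + \tfrac{1}{n}\,h(n) \to h(0)
  \quad \text{for all } h \in C_b(X), \\
f(x) &= |x|: \qquad \int f\,dP_n = 1 \not\to 0 = \int f\,d\delta_0, \qquad
\sup_n \int_{\{|f| > c\}} |f|\,dP_n = 1 \quad \text{for every } c > 0, \\
f(x) &= \sqrt{|x|}: \qquad \int f\,dP_n = n^{-1/2} \to 0 = \int f\,d\delta_0, \qquad
\sup_n \int_{\{|f| > c\}} |f|\,dP_n \le \frac{1}{c} \to 0 .
\end{align*}
```

In the first case $P_n \xrightarrow{w^*} \delta_0$ but $f$ is not uniformly integrable over $\{P_n\}$ and the integrals fail to converge to the limit integral; in the second case the tail condition of Definition A.2 holds and the conclusion of Theorem A.2 is recovered.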
This concludes our discussion on analytical preliminaries for the first part of this paper.